Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

Authors

  • George C. Linderman
  • Gal Mishne
  • Yuval Kluger
  • Stefan Steinerberger
Abstract

If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log\log n$ points chosen randomly among its $c_{d,2} \log n$-nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph, with $\sim n \log\log n$ instead of $\sim n \log n$ edges, that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$-nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
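The construction described in the abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the brute-force neighbor search, the sample size `n = 400`, and the constants `c1` and `c2` (stand-ins for $c_{d,1}$ and $c_{d,2}$, whose values the abstract does not give) are all hypothetical choices.

```python
import math
import random


def randomized_near_neighbor_graph(points, k_near, k_pick, rng):
    """Connect each point to k_pick points drawn at random
    from its k_near nearest neighbors (undirected edges)."""
    n = len(points)
    edges = set()
    for i in range(n):
        # Brute-force neighbor search; adequate for a small illustration.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        candidates = others[:k_near]
        for j in rng.sample(candidates, min(k_pick, len(candidates))):
            edges.add((min(i, j), max(i, j)))
    return edges


def largest_component_size(n, edges):
    """Size of the largest connected component, via union-find."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    sizes = {}
    for v in range(n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values())


rng = random.Random(0)
n, d = 400, 2
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]

# Assumed constants for illustration; the abstract does not specify them.
c1, c2 = 2.0, 2.0
k_near = max(1, math.ceil(c2 * math.log(n)))            # ~ c_{d,2} log n
k_pick = max(1, math.ceil(c1 * math.log(math.log(n))))  # ~ c_{d,1} log log n

sparse_edges = randomized_near_neighbor_graph(points, k_near, k_pick, rng)
full_edges = randomized_near_neighbor_graph(points, k_near, k_near, rng)

print("edges (randomized, full k-NN):", len(sparse_edges), len(full_edges))
print("largest component fraction:", largest_component_size(n, sparse_edges) / n)
```

Setting `k_pick = k_near` recovers the ordinary nearest-neighbor graph, so the two edge counts printed above give a rough sense of the sparsification at this sample size. A single small run like this says nothing about the asymptotic statement in the abstract.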


Similar resources

Graph-based time-space trade-offs for approximate near neighbors

We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size $n = 2^{o(d)}$ on the $d$-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approx...


A Critical Point for Random Graphs with a Given Degree Sequence

Given a sequence of non-negative real numbers $\lambda_0, \lambda_1, \ldots$ which sum to 1, we consider random graphs having approximately $\lambda_i n$ vertices of degree $i$. Essentially, we show that if $\sum_i i(i-2)\lambda_i > 0$ then such graphs almost surely have a giant component, while if $\sum_i i(i-2)\lambda_i < 0$ then almost surely all components in such graphs are small. We can apply these results to $G_{n,p}$, $G_{n,M}$, and other well-kno...


Non-zero probability of nearest neighbor searching

Nearest Neighbor (NN) searching is a challenging problem in data management and has been widely studied in data mining, pattern recognition and computational geometry. The goal of NN searching is to efficiently report the data nearest to a given query object. Most studies assume that both the data and the query are precise; however, due to the real applications of NN searching, suc...


Hierarchical neighbor graphs: A fully distributed topology for data collection in wireless sensor networks

We introduce hierarchical neighbor graphs, a new topology control mechanism for wireless sensor networks. This mechanism is a randomized one that takes a single parameter, 0 < p < 1, and uses it to build a structure that has the flavor of hierarchical clustering and is fully distributed in the sense that it requires only local knowledge at each node to be formed and repaired, and moreover requi...


A Survey on Complexity of Integrity Parameter

Many graph-theoretical parameters have been used to describe the vulnerability of communication networks, including toughness, binding number, rate of disruption, neighbor-connectivity, integrity, mean integrity, edge-connectivity vector, l-connectivity and tenacity. In this paper we discuss integrity and its properties in vulnerability calculation. The integrity of a graph G, I(G), is defined t...



Journal title:
  • CoRR

Volume abs/1711.04712


Publication date: 2017